This is a heterogeneous cell population


Methylation pattern reconstruction problem

methylFlow: http://github.com/hcorrada/methylFlow

[Bioinformatics, in press]


The statistic: number of reads \( y_v \) in each genomic region \( v \)


The model: expected number of reads in each genomic region

\[ \mathbb{E} y_v = \sum_{u:(v,u) \in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p \]
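A minimal Python sketch of the model, assuming a hypothetical toy region graph with source `s`, sink `t`, and regions `a`, `b`, `c`. The weights in `ELL` (standing in for \( \ell_{vu} \)), the two candidate patterns in `PATHS`, and their abundances are made-up values, not methylFlow data. The function swaps the order of summation: each path adds \( \ell_{vu} \theta_p \) to the expected count of \( v \) for every edge \( (v,u) \) it uses.

```python
from collections import defaultdict

# Hypothetical toy region graph: edge (v, u) -> weight ell_vu (made-up values).
ELL = {
    ("s", "a"): 0.0, ("s", "b"): 0.0,  # edges out of the source carry no reads
    ("a", "c"): 2.0, ("b", "c"): 1.0,
    ("c", "t"): 1.0,
}

# Hypothetical candidate patterns: each is an s-t path with abundance theta_p.
PATHS = {
    "p1": (["s", "a", "c", "t"], 10.0),
    "p2": (["s", "b", "c", "t"], 5.0),
}


def path_edges(vertices):
    """Consecutive vertex pairs (v, u) along a path."""
    return list(zip(vertices, vertices[1:]))


def expected_reads(ell, paths):
    """E[y_v] = sum over out-edges (v, u) of ell_vu * (sum of theta_p over paths using (v, u))."""
    expected = defaultdict(float)
    for vertices, theta in paths.values():
        for v, u in path_edges(vertices):
            expected[v] += ell[(v, u)] * theta
    return dict(expected)
```

On this toy graph, `expected_reads(ELL, PATHS)` gives 20.0 for `a`, 5.0 for `b`, and 15.0 for `c`, since region `c` is shared by both patterns (the source gets 0.0 because its edges have zero weight).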


The estimator

\[ \min_{\theta_p} \sum_v \lvert y_v - \sum_{u:(v,u)\in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p \rvert + \lambda \sum_p \lvert \theta_p \rvert \]
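Assuming abundances are nonnegative (\( \theta_p \geq 0 \), so \( \lvert \theta_p \rvert = \theta_p \)), the estimator becomes a linear program once each absolute residual is replaced by a slack variable \( e_v \). This is a standard linearization, and the name \( e_v \) is introduced here only for illustration:

\[
\begin{aligned}
\min_{\theta \geq 0,\; e} \quad & \sum_v e_v + \lambda \sum_p \theta_p \\
\textrm{s.t.} \quad & e_v \geq y_v - \sum_{u:(v,u)\in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p && \forall v \\
& e_v \geq \sum_{u:(v,u)\in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p - y_v && \forall v
\end{aligned}
\]

At the optimum each \( e_v \) equals the absolute residual at \( v \). The catch is that there is one variable \( \theta_p \) per candidate path, and the number of source-to-sink paths in a DAG can grow exponentially with the number of vertices.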


How to solve efficiently

\[ \min_{\theta_p} \sum_v \lvert y_v - \sum_{u:(v,u)\in E} \ell_{vu} \sum_{p:(v,u)\in p} \theta_p \rvert + \lambda \sum_p \lvert \theta_p \rvert \]

If we interpret each pattern's abundance \( \theta_p \) as flow along path \( p \), we can rewrite the problem in terms of edge flows:

\[ f_{vu} = \sum_{p:(v,u) \in p} \theta_p \]
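A minimal sketch of the path-to-edge mapping, using hypothetical toy paths and abundances (not methylFlow code). It accumulates \( f_{vu} \) over the paths that use each edge, then checks the two properties the rewrite relies on: the induced flow is conserved at every vertex other than the source `s` and sink `t`, and the total flow into the sink equals \( \sum_p \theta_p \).

```python
from collections import defaultdict

# Hypothetical toy patterns: s-t paths with abundances theta_p (made-up values).
PATHS = {
    "p1": (["s", "a", "c", "t"], 10.0),
    "p2": (["s", "b", "c", "t"], 5.0),
}


def edge_flows(paths):
    """f_vu = sum of theta_p over all paths p that use edge (v, u)."""
    flow = defaultdict(float)
    for vertices, theta in paths.values():
        for v, u in zip(vertices, vertices[1:]):
            flow[(v, u)] += theta
    return dict(flow)


def conservation_violations(flow, source="s", sink="t"):
    """Vertices other than source and sink whose inflow differs from outflow."""
    net = defaultdict(float)
    for (v, u), f in flow.items():
        net[v] -= f  # flow leaving v
        net[u] += f  # flow entering u
    return {v: x for v, x in net.items() if v not in (source, sink) and abs(x) > 1e-9}


f = edge_flows(PATHS)
# Every s-t path enters the sink through exactly one edge, so this equals sum_p theta_p.
total_into_sink = sum(x for (v, u), x in f.items() if u == "t")
```

Here `f[("a", "c")]` is 10.0, `f[("b", "c")]` is 5.0, and `f[("c", "t")]` is 15.0, so the flow into `t` matches the total abundance, and `conservation_violations(f)` is empty.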

\[ \begin{aligned} \min_{f \geq 0} \quad & \sum_v \lvert y_v - \sum_{u:(v,u)\in E} \ell_{vu} f_{vu} \rvert + \lambda \sum_{v:(v,t)\in E} f_{vt} \\ \textrm{s.t.} \quad & \sum_{u:(v,u) \in E} f_{vu} = \sum_{w:(w,v) \in E} f_{wv} \quad \forall v \in V \setminus \{s, t\} \end{aligned} \]
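Because the edge-flow problem has one variable per edge rather than one per path, it stays small enough for an off-the-shelf LP solver once the absolute values are linearized. The sketch below does not solve it; it only checks, on hypothetical toy data (`Y`, `ELL`, `PATHS`, and `LAM` are made-up), that the edge-flow objective, with the penalty taken as \( \lambda \) times the total flow into the sink, equals the path objective for the edge flows induced by nonnegative path abundances.

```python
# Hypothetical toy data (not from methylFlow).
ELL = {("s", "a"): 0.0, ("s", "b"): 0.0, ("a", "c"): 2.0, ("b", "c"): 1.0, ("c", "t"): 1.0}
Y = {"a": 18.0, "b": 6.0, "c": 15.0}  # observed read counts per region
PATHS = {"p1": (["s", "a", "c", "t"], 10.0), "p2": (["s", "b", "c", "t"], 5.0)}
LAM = 0.5  # penalty weight lambda


def path_objective(y, ell, paths, lam):
    """sum_v |y_v - sum_u ell_vu * sum_{p uses (v,u)} theta_p| + lam * sum_p theta_p, for theta >= 0."""
    fitted = {v: 0.0 for v in y}
    for vertices, theta in paths.values():
        for v, u in zip(vertices, vertices[1:]):
            if v in fitted:  # source and sink have no observed counts
                fitted[v] += ell[(v, u)] * theta
    loss = sum(abs(y[v] - fitted[v]) for v in y)
    return loss + lam * sum(theta for _, theta in paths.values())


def flow_objective(y, ell, flow, lam, sink="t"):
    """sum_v |y_v - sum_u ell_vu * f_vu| + lam * (total flow into the sink)."""
    fitted = {v: 0.0 for v in y}
    for (v, u), f in flow.items():
        if v in fitted:
            fitted[v] += ell[(v, u)] * f
    loss = sum(abs(y[v] - fitted[v]) for v in y)
    return loss + lam * sum(f for (v, u), f in flow.items() if u == sink)


# Edge flows induced by the path abundances: f_vu = sum of theta_p over paths using (v, u).
FLOW = {}
for vertices, theta in PATHS.values():
    for edge in zip(vertices, vertices[1:]):
        FLOW[edge] = FLOW.get(edge, 0.0) + theta
```

Both `path_objective(Y, ELL, PATHS, LAM)` and `flow_objective(Y, ELL, FLOW, LAM)` return 10.5 here: the absolute residuals are 2, 1, and 0 at `a`, `b`, and `c`, plus a penalty of 0.5 * 15.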


Pattern reconstruction from whole genome bisulfite sequencing

Dataset of 50 bp reads from mouse wild-type activated B cells and two types of progenitor cells (CLP and KSL).

Reconstructs patterns 4-100x the read length (in base pairs)


Pattern reconstruction from whole genome bisulfite sequencing

Reconstructs patterns with accurate marginal (per-CpG) methylation estimates
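A minimal sketch of one way to compute marginal estimates from reconstructed patterns, assuming "marginal" means the abundance-weighted fraction of patterns methylated at each CpG among the patterns covering it. The pattern representation and the values in `PATTERNS` are hypothetical. Per-CpG levels computed this way could then be compared with the fraction of methylated reads observed at each CpG.

```python
# Hypothetical reconstructed patterns: {CpG position: methylated?} with abundance theta_p.
PATTERNS = [
    ({100: True, 105: True, 112: False}, 6.0),
    ({100: False, 105: True, 112: False}, 2.0),
    ({112: True, 120: True}, 2.0),
]


def marginal_methylation(patterns):
    """Abundance-weighted fraction of methylated calls at each CpG covered by some pattern."""
    methylated = {}
    covered = {}
    for calls, theta in patterns:
        for cpg, is_meth in calls.items():
            covered[cpg] = covered.get(cpg, 0.0) + theta
            if is_meth:
                methylated[cpg] = methylated.get(cpg, 0.0) + theta
    return {cpg: methylated.get(cpg, 0.0) / covered[cpg] for cpg in sorted(covered)}
```

For these toy patterns, `marginal_methylation(PATTERNS)` gives 0.75 at CpG 100, 1.0 at 105, 0.2 at 112, and 1.0 at 120; CpG 112 is covered by all three patterns but methylated only in the third.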


Pattern reconstruction from targeted bisulfite sequencing

Compare patterns across samples and populations


Moving forward

Cell-specific methylation pattern reconstruction